† Corresponding author. E-mail:
For de novo protein structure determination in challenging cases, multi-wavelength anomalous diffraction (MAD) and multiple isomorphous replacement (MIR) phasing are powerful tools for obtaining low-resolution initial phases from heavy-atom derivative datasets; phase extension against high-resolution data is then needed to obtain accurate structures. In this context, we propose a direct-methods procedure that can improve the quality of the initial low-resolution MAD/MIR phases, and we describe an automated process for extending the initial phases to high resolution. Both procedures are implemented in the newly released IPCAS pipeline. Three test cases are used: one set of 4.17 Å MAD data from a membrane protein, and two sets of MAD/MIR data with derivatives truncated to 6.80 Å and 6.90 Å, respectively. In all three cases, the initial phases generated by the direct-methods procedure are better than those from the conventional MAD/MIR methods. The automated phase extensions for the latter two cases, from 6.80 Å to 3.00 Å and from 6.90 Å to 2.80 Å, are successful, leading to complete models. These procedures may provide convenient and reliable tools for phase improvement and phase extension in difficult low-resolution tasks.
Experimental phasing methods are the major choice for de novo protein structure determination in x-ray crystallography. In these methods, the phasing process starts with finding the substructure in the asymmetric unit of the derivative crystals, and the initial phases are then calculated by one of several techniques, such as single-wavelength anomalous diffraction (SAD), single isomorphous replacement (SIR), multiwavelength anomalous diffraction (MAD), multiple isomorphous replacement (MIR), SIR with anomalous scattering (SIRAS), and MIR with anomalous scattering (MIRAS).[1] Because phase ambiguity is intrinsic to SAD/SIR, MAD/MIR phasing is more powerful: it combines the bimodal phase distributions of several SAD/SIR data sets to give a unique phase indication for each individual reflection. Programs such as SHARP, SOLVE, and MLPHARE are available to obtain the phase information via conventional MAD/MIR methods.[2–4] In some challenging cases of membrane proteins or large complexes, intrinsic disorder or flexibility often results in low resolution[5] (say, about 6.00 Å or lower) in the heavy-atom derivative data, making it difficult for crystallographers to obtain the initial phases. Even under this condition, MAD and MIR are still reliable and can provide useful information for structure determination.
However, MAD and MIR have their own limitations: for MIR, the isomorphism between the native and the derivative crystals is not always perfect; for MAD, severe radiation damage may lead to weak anomalous signals in the individual SAD data sets. These problems can be even more serious for poorly diffracting crystals, making the phases ambiguous at high resolution and leaving good-quality phases available only at lower resolution.[6] We therefore propose an iterative direct-methods MAD/MIR phasing procedure to reduce the effects of these problems. It integrates iterative direct-methods SAD/SIR phasing[7,8] with the conventional MAD/MIR methods, and tests with three low-resolution cases show that this hybrid procedure performs better than either method alone.
In our tests, we find that the low-resolution initial phases obtained by the hybrid procedure mentioned above are far from sufficient to generate an electron density map that is interpretable by either manual or automated model building. Phase extension therefore becomes an essential part of the subsequent process. In addition, prior knowledge of the structure, such as non-crystallographic symmetry (NCS), a homologous template, or multi-crystal information, is often needed in difficult phase extension cases.[9,10] When such prior structural knowledge is unavailable, automated phase extension becomes a grand challenge.
There are many different methods for phase extension, such as solvent flattening,[11] histogram matching,[12] the maximum-entropy method,[13] and maximum-likelihood density modification.[14] In practice, all of these methods have been combined with the solvent-flattening technique within a dual-space framework to improve their efficiency. We present here an iterative direct-methods phase extension procedure, obtained by modifying the iterative direct-methods-aided partial-model extension proposed in 2007.[15] This method differs from all the above in that it uses direct methods to provide phase constraints in reciprocal space and, in addition, it is independent of prior structural knowledge. Two test cases with starting phases at 6.90 Å and 6.80 Å have been successfully extended to native data at resolutions of 2.80 Å and 3.00 Å, respectively, and the final models agree well with the previously deposited structures in the Protein Data Bank (PDB;
The method consists of two stages. The first stage obtains the low-resolution initial phases via a direct-methods MAD/MIR phasing procedure, and the second stage extends the low-resolution phases to higher resolution automatically. The details of these two stages are described in the following sections.
The procedure consists of the following steps.
(i) Phases are first calculated by the conventional MAD/MIR methods.
(ii) The MAD/MIR phases from step (i) are cut off at low resolution and used as the starting “known phases” for the direct-methods SAD/SIR phasing. For details of direct-methods SAD/SIR phasing, please refer to Refs. [7] and [8].
(iii) Among the MAD/MIR derivative datasets, the one with the strongest anomalous signal or the highest resolution is selected for the iterative direct-methods SAD/SIR phasing, based on the starting known phases from step (ii).
(iv) Each cycle of the procedure consists of three parts: direct-methods phasing, density modification, and model building/refinement. From the second cycle onwards, the model from the previous cycle is fed back into the current cycle. The known phases are “fixed” in each cycle until the percentage of residues built reaches a user-defined threshold (say, about 80%).
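The control flow of step (iv) can be sketched as follows. This is only an illustrative outline under stated assumptions, not the IPCAS implementation: `run_cycle` is a hypothetical callable standing in for one cycle of direct-methods phasing, density modification, and model building/refinement, and the names `iterate_phasing` and `release_fraction` are invented for this sketch.

```python
from typing import Callable, Dict, Tuple

Phases = Dict[Tuple[int, int, int], float]  # Miller index -> phase in degrees


def iterate_phasing(
    known: Phases,
    run_cycle: Callable[[Phases, bool, object], Tuple[Phases, object, float]],
    n_cycles: int = 20,
    release_fraction: float = 0.80,
):
    """Run n_cycles cycles, keeping `known` fixed until enough residues are built.

    run_cycle(known, fixed, model) is a hypothetical stand-in for one cycle.
    It must return (phases, model, fraction_of_residues_built).
    """
    model = None           # no model exists before the first cycle
    fixed = True           # the known phases start out fixed
    phases: Phases = dict(known)
    history = []           # (cycle number, fixed flag used, fraction built)
    for cycle in range(1, n_cycles + 1):
        # The model from the previous cycle is fed back into this one.
        phases, model, built = run_cycle(known, fixed, model)
        history.append((cycle, fixed, built))
        if fixed and built >= release_fraction:
            fixed = False  # subsequent cycles no longer hold the known phases
    return phases, model, history
```

Passing the cycle in as a callable keeps the sketch independent of any particular program; the only logic shown is the feedback of the model and the release of the fixed phases once the user-defined threshold is reached.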
In practice, SAD/SIR phase estimates are less accurate than MAD/MIR estimates because of the intrinsic phase ambiguity. The direct method proposed by Fan and Gu[16] in 1985 was designed to overcome this problem by reducing the 0–2π phase problem to a choice between a plus and a minus sign. In addition, with high-quality, low-resolution known phases as the starting point, the P+ formula derived from this theory[16] modulates the bimodal phase probability distribution more accurately, thus strengthening the phasing power. “Fixed” in step (iv) means that the known phases remain unchanged in the direct-methods phase calculation, and that these phases are used to calculate the value of P+ instead of setting it to a constant value of 0.5. Please refer to Ref. [6] for more details.
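To illustrate how a sign probability resolves a bimodal phase, the sketch below treats the two candidate phases as φ_ref ± |Δφ| and weights them by P+ and 1 − P+. It is a simplified two-point model, not the P+ formula itself: how P+ is computed is not shown, experimental error terms are ignored, and the names `phi_ref_deg`, `abs_dphi_deg`, and `best_phase_and_fom` are assumptions introduced here.

```python
import cmath
import math


def best_phase_and_fom(phi_ref_deg, abs_dphi_deg, p_plus):
    """Combine the two candidate phases phi_ref +/- |dphi| using P+.

    Returns (best phase in degrees in [0, 360), figure of merit in [0, 1]).
    """
    d = math.radians(abs_dphi_deg)
    # Centroid of the two-point distribution on the unit circle:
    # P+ * exp(+i d) + (1 - P+) * exp(-i d) = cos d + i (2 P+ - 1) sin d
    centroid = complex(math.cos(d), (2.0 * p_plus - 1.0) * math.sin(d))
    fom = abs(centroid)  # length of the centroid vector
    best = (phi_ref_deg + math.degrees(cmath.phase(centroid))) % 360.0
    return best, fom
```

With P+ = 0.5 the two candidates are equally likely, so the best phase stays at φ_ref and the figure of merit drops to cos|Δφ|; as P+ approaches 1 or 0 the best phase moves to one of the two candidates and the figure of merit approaches 1.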
The procedure thus achieves two things at one stroke. The starting known phases from the low-resolution MAD/MIR procedure greatly enhance the phasing power of direct methods, while the low-resolution MAD/MIR phases are extended to higher resolution with the aid of the SAD/SIR iteration. Moreover, because only one set of derivative data, the one with the strongest anomalous signal or the highest resolution, is used, the phasing result is less affected by imperfect isomorphism or weak anomalous signals. The high-resolution derivative data are not limited by the other low-resolution datasets and can be used to their full potential.
The flowchart of the iterative direct-methods MAD/MIR phasing procedure is presented in Fig.
This procedure was originally proposed in 2007 as a direct-methods-aided partial-model extension that does not need SAD/SIR information. By redefining the variables in the P+ probability formula proposed by Fan and Gu,[16] the phase extension problem of finding a value in the range 0–2π for each unknown phase is reduced to a choice between two possible values. It resembles a “phase-flipping” process, in which phases that differ greatly from the correct value undergo a large shift, while phases that are close to the correct value remain unchanged. For more details, please refer to Ref. [15].
Usually, the direct-methods-aided partial-model extension starts with an incomplete model. However, the density map calculated at low resolution in stage 1 may not be sufficient to build a reliable structural model, so we first use the initial phases to build a model against the high-resolution structure amplitudes. The resulting initial model then serves as the starting information for direct-methods phasing in OASIS. At the same time, the initial phases can also be used as “known phases” that remain fixed in the iteration until the resulting model has grown to a user-defined threshold (say, about 80% of the whole structure).
The flowchart of the iterative direct-methods phase extension procedure is displayed in Fig.
All the tests stated in this section are conducted in the newly released IPCAS 2.0 pipeline. The programs and their usage in the tests are listed in Table
Three sets of protein data are used to test the methods: the MAD data of the human K2P TRAAK channel (PDB_ID: 3UM7),[20] the MAD data of the human BK channel Ca2+ gating apparatus (PDB_ID: 3MT5),[21] and the MIR data of R-phycoerythrin (PDB_ID: 1LIA).[22] In addition, the derivative data sets of 3MT5 and 1LIA are manually truncated to 6.80 Å (original resolution: 3.30 Å) and 6.90 Å (original resolution: 3.00 Å), respectively. For all three cases, we use the previously deposited model in the PDB as the reference model in our tests. Detailed data statistics are listed in Table
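Truncating a data set to a given resolution amounts to discarding every reflection whose d-spacing is smaller than the cutoff. The sketch below shows this under the simplifying assumption of an orthogonal unit cell (all angles 90°); the cell dimensions and function names are hypothetical and are not taken from the test cases.

```python
import math


def d_spacing(hkl, cell):
    """d-spacing (Å) of reflection hkl for an orthogonal cell (a, b, c) in Å."""
    h, k, l = hkl
    a, b, c = cell
    inv_d2 = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    # The (0, 0, 0) term has no finite resolution; treat it as infinitely low.
    return math.inf if inv_d2 == 0 else 1.0 / math.sqrt(inv_d2)


def truncate(reflections, cell, d_min):
    """Keep only reflections with d >= d_min, i.e. at or below that resolution."""
    return {
        hkl: value
        for hkl, value in reflections.items()
        if d_spacing(hkl, cell) >= d_min
    }
```

For example, with a hypothetical 80 × 80 × 80 Å cell, `truncate(data, (80.0, 80.0, 80.0), 8.0)` keeps the reflection (10, 0, 0) at d = 8.0 Å and drops (11, 0, 0) at about 7.27 Å. A non-orthogonal cell would need the general reciprocal-metric expression instead.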
The human K2P TRAAK channel (3UM7) is a membrane protein. In the original work, the initial phases were calculated in SHARP via the MIRAS method, and the final model was built by iterative manual building.
In the test, the heavy-atom sites and the conventional MAD phases are first calculated by SHARP; the MAD phases are then truncated to 8.00 Å and regarded as the “known phases”. The diffraction data from the “peak” wavelength at 4.17 Å resolution, together with the known phases, are used for 20 cycles of iterative direct-methods MAD phasing (IPCAS iteration control: OASIS+DM+AutoBuild). The resulting electron density matches the reference model well (see Fig.
The human BK channel Ca2+ gating apparatus is also a membrane protein. In the original work, the initial phases were calculated by MAD phasing, and the model was generated and extended against the native data by iterative manual building.
In the test, the heavy-atom sites and the MAD phases are first calculated by SHARP; the MAD phases are then truncated to 8.00 Å and regarded as the “known phases”. The truncated 6.80 Å “peak-wavelength” data, together with the known phases, are used for 20 cycles of iterative direct-methods MAD phasing (IPCAS iteration control: OASIS+AutoBuild). A backbone structure containing 41.8% of the total residues is built from the resulting density map. The map matches the reference model reasonably well (see Fig.
The structure of R-phycoerythrin (Rpe) was originally determined via the MIR method with four sets of derivative data and the native data.
In our test, the heavy-atom sites and the conventional MIR phases are first calculated by SOLVE.[3] The MIR phases are then truncated to 8.00 Å resolution and regarded as the “known phases”. The truncated 6.90 Å Au-derivative data and the native data, together with the initial known phases, are used for 20 cycles of iterative direct-methods MIR phasing (IPCAS iteration control: OASIS+AutoBuild). In the resulting electron density map, 71.2% of the total residues are traced automatically, and the map reveals the essential features of the reference model (see Fig.
For all three test cases, we have compared the final figure-of-merit (FOM)-weighted mean phase errors of the conventional MAD/MIR phasing, the iterative direct-methods SAD/SIR phasing, and the iterative direct-methods MAD/MIR phasing. The results of the iterative direct-methods MAD/MIR phasing are clearly better than those of the other two methods, indicating that it can further improve the conventional MAD/MIR phases. Detailed results are listed in Table
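For readers unfamiliar with the metric, one common way to compute a FOM-weighted mean phase error is sketched below: each phase is compared with the corresponding phase from the reference model, the difference is wrapped into the range 0 to 180°, and the differences are averaged with the figures of merit as weights. This is an assumed definition for illustration; the exact convention used in the tests is not spelled out in the text, and the function name is hypothetical.

```python
def fom_weighted_mean_phase_error(phases_deg, ref_phases_deg, foms):
    """FOM-weighted mean absolute phase difference in degrees.

    phases_deg and ref_phases_deg are matched lists of phases for the same
    reflections; foms holds the figure of merit (weight) of each reflection.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for phi, ref, fom in zip(phases_deg, ref_phases_deg, foms):
        diff = abs(phi - ref) % 360.0
        if diff > 180.0:
            diff = 360.0 - diff  # phases are periodic: wrap into [0, 180]
        weighted_sum += fom * diff
        weight_total += fom
    if weight_total == 0.0:
        raise ValueError("all figures of merit are zero")
    return weighted_sum / weight_total
```

The wrapping step matters: phases of 10° and 350° differ by 20°, not 340°. A value near 90° corresponds to essentially random phases, so lower values indicate better agreement with the reference model.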
Although the “sausage-like” electron density maps obtained in stage 1 reveal some basic characteristics of the structures, the models lack accurate side-chain information owing to the low resolution of the diffraction data. Phase extension is therefore conducted to obtain more precise structures.
In the human K2P TRAAK channel case, automated extension of the phases and model to higher resolution failed, owing to the severe anisotropy of the native data, which are elliptically truncated and scaled to 3.80×3.30×3.80 Å. For the other two cases, automated phase extension is successfully achieved.
For the human BK channel Ca2+ gating apparatus case, the 6.80 Å phases from stage 1 are first used to build an initial model, which is refined by PHENIX.AutoBuild against the 3.00 Å native data. The model and the native data are then used for 17 cycles of iterative direct-methods phase extension. The final model is nearly complete, with Rwork/Rfree reaching 0.25/0.29. The results of cycles 0, 4, 10, and 17 are listed in Table
The 1LIA case has a two-fold NCS operator. The 6.90 Å phases are first used to build an initial model, which is refined by PHENIX.AutoBuild against the 2.80 Å native data. The model and the native data are then used for 15 cycles of iterative direct-methods phase extension. The final model is nearly complete, with Rwork/Rfree reaching 0.25/0.28. The results of cycles 0, 4, 8, and 15 are listed in Table
We have compared phase extension by RESOLVE[25] in PHENIX with our direct-methods phase extension procedure. The phases extended by RESOLVE are passed to PHENIX.AutoBuild for model building and refinement, and the building results of the two cases are listed in Tables
The iterative direct-methods MAD/MIR phasing procedure is shown to reliably improve the quality of low-resolution phases calculated by the conventional MAD/MIR methods. The iterative direct-methods phase extension procedure extends phases automatically in two specific cases (from 6.90 Å to 2.80 Å and from 6.80 Å to 3.00 Å), both resulting in nearly complete models. The techniques proposed in this paper are useful when dealing with MAD/MIR data in which the derivative crystals diffract to much lower resolution than the native crystal.
Furthermore, low-resolution phases from other sources, such as a molecular replacement (MR) template, a cryoEM map, or even nuclear magnetic resonance (NMR) data, could possibly be extended to high resolution by using the procedure described in this article. In addition, because the success of the direct-methods phase extension depends on the quality of the initial phases and the resolution of the high-resolution data, quantitative analysis is needed to investigate how these factors influence the final results. Such analysis should open up more applications of this low-resolution phase extension tool in protein crystallography.